Vibrio harveyi strain E385 was isolated from a diseased cage-cultured grouper in Daya Bay, China. Phylogenetic analysis based
on the 16S rRNA gene sequence showed similarity with V. harveyi strain BAA-1116. We sequenced the pathogenic strain V. harveyi
E385 and compared the genome with that of the nonpathogenic strain V. harveyi BAA-1116.

Vibrio harveyi is a Gram-negative halophilic bacterium that can
be found swimming free in tropical marine waters or as part of
the resident microflora of marine animals. It has also been recog- 
nized as an opportunistic pathogen to many commercially farmed 
marine invertebrate and vertebrate species (1). V. harveyi strain 
E385, isolated from a diseased cage-cultured grouper (Epinephelus 
coioides) in Daya Bay, China, in 2009, has been identified as a
member of a V. harveyi clade and shares a high degree of genetic 
similarity with V. harveyi strain BAA-1116. Importantly, the mi- 
croorganism is a virulent strain, based on the results of artificial 
infection tests. 

V. harveyi E385 was sequenced on the HiSeq 2000 sequencing 
platform with a paired-end 2 × 100-nucleotide (nt) procedure. A
total of 24,892,942 paired-end reads were generated, with an average
length of 100 bp. After adapter and quality trimming of these
data, there was about 2.4 Gb of filtered or clean data remaining, 
which yielded almost 400-fold coverage. De novo assembly was 
performed through the CLC Genomics Workbench (2) and Seq- 
Man (3), resulting in the production of 94 scaffolds ranging from 
551 bp to 765,364 bp. Excluding the gaps, the draft genome has a 
total of 6,354,192 bp, with a G+C content of 44.8%. 
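
As a rough check on these figures, fold coverage is the amount of filtered sequence divided by the ungapped assembly length. The sketch below is illustrative only and is not the pipeline used for this genome; the function names and the toy scaffolds are hypothetical, and gaps are assumed to be written as runs of N.

```python
def fold_coverage(filtered_bases, genome_length):
    """Average sequencing depth: filtered bases per assembled base."""
    return filtered_bases / genome_length


def ungapped_length(scaffolds):
    """Total scaffold length, not counting gap characters (assumed to be N)."""
    return sum(len(s) - s.upper().count("N") for s in scaffolds)


def gc_content(scaffolds):
    """Percentage of G and C among the unambiguous (A/C/G/T) bases."""
    gc = at = 0
    for s in scaffolds:
        s = s.upper()
        gc += s.count("G") + s.count("C")
        at += s.count("A") + s.count("T")
    return 100.0 * gc / (gc + at)


# Figures reported in the text: about 2.4 Gb of clean data
# and a 6,354,192-bp draft genome (gaps excluded).
depth = fold_coverage(2.4e9, 6_354_192)
print(f"{depth:.0f}-fold coverage")

# Toy scaffolds (made up) to exercise the helpers.
toy = ["ATGCNNNNGC", "ggccat"]
print(ungapped_length(toy), f"{gc_content(toy):.1f}% G+C")
```

With the reported values this gives roughly 378-fold coverage, in line with the "almost 400-fold" figure stated above.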

From the draft genome, protein-coding sequences were predicted
with the Glimmer 3.02 program (4), and 5,639 predicted
open reading frames (ORFs) were obtained. These predicted 
ORFs were annotated by a BLASTx search against the SwissProt 
database with a cutoff E value of 10⁻¹⁰. In total, 3,652 ORFs were
significantly matched by BLASTx hits, covering 64.8% of all pre- 
dicted ORFs. Furthermore, GO annotation was analyzed using the 
Blast2GO software (5, 6), by which GO terms were assigned to 
query sequences and catalogued groups were produced based on 
biological processes, molecular functions, and cellular components.
In total, 2,857 ORFs were assigned 14,313 GO terms.
Within molecular functions, the catalytic activity (GO:0003824)
and binding (GO:0005488) categories were highly represented, accounting
for 41.3% and 34.8% of the ORFs, respectively. Cells
(GO:0005623; 45.7%) and cell parts (GO:0044464; 45.7%) were
the most represented GO categories within the cellular components.
As to biological processes, metabolic process (GO:0008152;
28.9%) was the most highly represented category, followed by 
cellular process (GO:0009987; 26.1%). Using KEGG analysis (7) 
with the bidirectional best hit (BBH) method, 114 pathways were 
mapped, including metabolic pathways (127 members, KEGG:ko01100),
followed by secondary metabolite biosynthesis (64
members, KEGG:ko01110), microbial metabolism in diverse environments
(37 members, KEGG:ko01120), ABC transporters (27
members, KEGG:ko02010), two-component system (26 members, 
KEGG:ko02020), and flagellar assembly (15 members, KEGG: 
ko02040). To classify the possible functions of the genes, COG 
annotation was performed (8), and the results showed that 177 
and 99 of the predicted proteins fall into the U class (intracellular 
trafficking, secretion, and vesicular transport) and V class (defense
mechanisms), respectively. Noncoding RNA genes were
identified with tRNAscan-SE (9) and RNAmmer (10), which found 19
16S-23S-5S rRNA genes and 105 predicted tRNA genes in the genome.
Interestingly, we also found 85 toxin genes via an online predic- 
tion tool (see http://mvirdb.llnl.gov/) (11), with the parameters of 
the filter for BLAST results chosen using the default settings. These
results are consistent with the high virulence of strain E385 toward marine hosts.
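
The E-value cutoff and the annotated fraction reported above can be reproduced from tabular BLAST output. The following is a minimal sketch, assuming the standard 12-column tabular format of BLAST+ (`-outfmt 6`), in which the E value is the eleventh column; the function names, ORF identifiers, and sample hit lines are hypothetical, and this is not the script used for the analysis.

```python
def annotated_queries(blast_lines, evalue_cutoff=1e-10):
    """Return the set of query IDs with at least one hit at or below the cutoff."""
    hits = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query_id, evalue = fields[0], float(fields[10])
        if evalue <= evalue_cutoff:
            hits.add(query_id)
    return hits


def percent_annotated(n_annotated, n_total):
    """Share of predicted ORFs with a significant hit, as a percentage."""
    return 100.0 * n_annotated / n_total


# Made-up hit lines: qseqid, sseqid, pident, length, mismatch, gapopen,
# qstart, qend, sstart, send, evalue, bitscore.
sample = [
    "orf1\tsp|P1\t98.0\t300\t6\t0\t1\t900\t1\t300\t1e-150\t540",
    "orf1\tsp|P2\t40.0\t120\t70\t2\t10\t370\t5\t124\t2e-12\t70",
    "orf2\tsp|P3\t30.0\t50\t35\t0\t1\t150\t1\t50\t0.003\t35",
]
print(sorted(annotated_queries(sample)))

# Counts reported in the text: 3,652 matched ORFs out of 5,639 predicted.
print(f"{percent_annotated(3652, 5639):.1f}%")
```

Only `orf1` passes the cutoff in the toy input, and the reported counts give 64.8%, matching the figure in the text.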

Nucleotide sequence accession numbers. This whole-genome 
shotgun project has been deposited at DDBJ/EMBL/GenBank un- 
der the accession no. AYKI00000000. The version described in this 
paper is version AYKI01000000.